3 research outputs found

    Frequent Item Set Mining Using INC_MINE in Massive Online Analysis Frame Work

    Frequent pattern mining is one of the major data mining techniques and has been studied extensively over the past decade. Technological advances have led to the generation of huge volumes of data at an increasing rate of distribution. Data generated in this way is called a 'data stream'. Data streams can be mined only with sophisticated techniques. The paper aims to carry out frequent pattern mining on data streams. Stream mining is challenging because of its high memory usage and computational cost. The Massive Online Analysis framework is a software environment that is used here to perform frequent pattern mining with the INC_MINE algorithm. The algorithm mines closed frequent itemsets. The data sets used in the analysis are the Electricity data set and the Airline data set. The authors also generated their own data set, OUR-GENERATOR, for the analysis, and report that the results are interesting. In the experiments, five sample sizes (10000, 15000, 25000, 35000 and 50000 instances) are used with varying minimum support and window sizes to determine frequent closed itemsets and semi-frequent closed itemsets, respectively. The present work establishes that association rule mining can be performed on data streams with the INC_MINE algorithm by generating closed frequent itemsets, which the authors describe as the first of its kind in the literature.
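
    As a rough illustration of the closed frequent itemset idea described in this abstract, the Python sketch below recounts itemsets by brute force over a sliding window. It is not the INC_MINE algorithm and does not use the Massive Online Analysis framework; the function names, the window handling and the toy transactions are all assumptions made for illustration.

```python
from collections import Counter, deque
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for closed itemsets whose relative
    support in `transactions` is at least `min_support`.

    Brute force: enumerates every subset of every transaction, so it is
    only suitable for tiny examples (cost is exponential in basket size).
    """
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1

    frequent = {s: c for s, c in counts.items() if c / n >= min_support}

    # An itemset is closed if no proper superset has the same count.
    closed = {}
    for itemset, count in frequent.items():
        absorbed = any(
            itemset < other and count == other_count
            for other, other_count in frequent.items()
        )
        if not absorbed:
            closed[itemset] = count
    return closed


def mine_stream(stream, window_size, min_support):
    """Yield the closed frequent itemsets of each full sliding window.

    Hypothetical helper: recomputes from scratch per window instead of
    updating incrementally as a real stream miner would.
    """
    window = deque(maxlen=window_size)
    for transaction in stream:
        window.append(transaction)
        if len(window) == window_size:
            yield closed_frequent_itemsets(list(window), min_support)


# Toy data, invented for this example.
toy_stream = [["a", "b"], ["a", "b", "c"], ["a", "b"], ["c"], ["a", "b"]]
results = list(mine_stream(toy_stream, window_size=4, min_support=0.5))
```

    On the toy stream above, both windows give the closed frequent itemsets {a, b} with count 3 and {c} with count 2; {a} and {b} are frequent but not closed, because {a, b} occurs equally often.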

    Knowledge Discovery in Data Mining and Massive Data Mining

    Knowledge discovery is the process of non-trivial extraction of previously unknown and presently useful information. Rapid advances in technology have increased the rate at which data is distributed. Data generated by mobile applications, sensor applications, network monitoring, traffic management, weblogs and similar sources can be referred to as a data stream. Data streams are massive in nature. The present work aims at knowledge discovery using both data mining and massive data mining techniques. The knowledge discovery process in the two techniques is compared by developing a classification model with a Naive Bayes classifier. The former case uses Edu-data, a data set collected from the technical education system, and the latter case uses the Massive Online Analysis framework to generate the data streams. Mining data streams is referred to as massive data mining. Data streams must be processed under very strict constraints of space and time using sophisticated techniques. Traditional data mining techniques are not advised for such massive data, so the Massive Online Analysis framework is used to mine the data streams. The authors describe the present work as unique in the literature.

    Regression Model using Instance based Learning Streams

    Data mining is concerned with the analysis of data to find patterns and regularities in data sets. Statistics is a mathematical science concerned with the collection, analysis, interpretation or explanation, and presentation of data. Statistics plays a very important role in data mining analysis, and data visualization plays an equally important role in decision making. Instance Based Learning Streams is an instance-based learning algorithm used to perform regression analysis on data streams. The algorithm can handle large data streams with little memory and computational power. The paper aims to implement Instance Based Learning Streams as an extension to the Massive Online Analysis framework for data stream mining in order to develop a regression model. The study shows that regression analysis can be performed not only on small data sets but also on data streams, as in the present case, although the method of analysis differs between the two. For a small data set the regression models are linear, multiple and polynomial, while for data streams the entire analysis is performed within the Massive Online Analysis framework using two evaluators: the basic regression performance evaluator and the window regression performance evaluator. The authors describe this finding as the first of its kind in the literature.
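
    The contrast between a basic and a windowed regression evaluator can be sketched in Python as follows. This is a minimal sketch, not the Instance Based Learning Streams algorithm and not the Massive Online Analysis evaluators: it assumes a plain k-nearest-neighbour regressor over a bounded buffer of recent instances, a test-then-train loop, and mean absolute error as the only metric. The class name, function name and parameters are hypothetical.

```python
from collections import deque
import math


class WindowedKNNRegressor:
    """k-nearest-neighbour regressor that keeps only the most recent
    `max_instances` examples, so memory stays bounded on a stream."""

    def __init__(self, k=3, max_instances=200):
        self.k = k
        self.instances = deque(maxlen=max_instances)

    def predict(self, x):
        if not self.instances:
            return 0.0  # assumption: predict 0.0 before any data is seen
        nearest = sorted(
            self.instances, key=lambda inst: math.dist(inst[0], x)
        )[: self.k]
        return sum(y for _, y in nearest) / len(nearest)

    def learn(self, x, y):
        self.instances.append((tuple(x), y))


def prequential_mae(stream, model, window=50):
    """Test-then-train evaluation.

    Returns (overall_mae, windowed_mae): the mean absolute error over
    the whole stream and over only the last `window` instances.
    """
    total_error = 0.0
    seen = 0
    recent_errors = deque(maxlen=window)
    for x, y in stream:
        error = abs(model.predict(x) - y)  # test first ...
        total_error += error
        seen += 1
        recent_errors.append(error)
        model.learn(x, y)                  # ... then train
    if seen == 0:
        raise ValueError("stream is empty")
    return total_error / seen, sum(recent_errors) / len(recent_errors)


# Toy stream invented for this example: y = 2x for x = 0, 1, ..., 199.
toy_stream = [((float(i),), 2.0 * i) for i in range(200)]
overall_mae, windowed_mae = prequential_mae(
    toy_stream, WindowedKNNRegressor(k=1), window=50
)
```

    The overall figure summarizes performance since the start of the stream, while the windowed figure reflects only recent instances and so responds faster when the underlying relationship changes.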